Constructing a Class-Based Lexical Dictionary using Interactive Topic Models

Authors

  • Kugatsu Sadamitsu
  • Kuniko Saito
  • Kenji Imamura
  • Yoshihiro Matsuo
Abstract

This paper proposes a new method of constructing related-word dictionaries for arbitrary classes, based on interactive topic models; we assume that each class is described by a topic. We propose a new semi-supervised method that uses the simplest topic model, trained with the standard EM algorithm, so model calculation is very rapid. Furthermore, our approach allows a dictionary to be modified interactively, and the final dictionary has a hierarchical structure. This paper makes three contributions. First, we propose a word-based semi-supervised topic model. Second, we apply the semi-supervised topic model to interactive learning; this approach is called the Interactive Topic Model. Third, we propose a score function that extracts the related words occupying the middle layer of the hierarchical structure. Experiments show that our method can appropriately retrieve the words belonging to an arbitrary class.
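The abstract does not spell out the model, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes the "simplest topic model" is a mixture of unigrams (one topic per document) fitted by EM, and it assumes word-level supervision can be approximated by adding pseudo-counts for seed words to their class's topic. The names `train_mixture_of_unigrams`, `related_words`, `alpha`, and `boost` are hypothetical, and ranking by p(w|z) is a stand-in for the paper's score function, which the abstract does not define.

```python
import math
from collections import Counter


def train_mixture_of_unigrams(docs, num_topics, seeds, iters=30, alpha=0.01, boost=10.0):
    """EM for a mixture-of-unigrams model with seed-word pseudo-counts.

    docs:  list of token lists.
    seeds: dict mapping topic index -> set of seed words (assumed form of
           word-level supervision; not taken from the paper).
    Returns (priors, word_probs, responsibilities).
    """
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]

    def m_step(resp):
        priors, word_probs = [], []
        for z in range(num_topics):
            # Smoothing mass alpha for every word, plus a boost for seed words.
            weights = {w: alpha + (boost if w in seeds.get(z, ()) else 0.0)
                       for w in vocab}
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    weights[w] += r[z] * n
            total = sum(weights.values())
            word_probs.append({w: v / total for w, v in weights.items()})
            # Add-one smoothing keeps every topic prior strictly positive.
            priors.append((sum(r[z] for r in resp) + 1.0) / (len(docs) + num_topics))
        return priors, word_probs

    # Start from uniform responsibilities; the seed boosts break the symmetry.
    resp = [[1.0 / num_topics] * num_topics for _ in docs]
    priors, word_probs = m_step(resp)

    for _ in range(iters):
        # E-step: posterior over topics for each document, computed in log space.
        resp = []
        for c in counts:
            logp = [math.log(priors[z])
                    + sum(n * math.log(word_probs[z][w]) for w, n in c.items())
                    for z in range(num_topics)]
            top = max(logp)
            exps = [math.exp(lp - top) for lp in logp]
            norm = sum(exps)
            resp.append([e / norm for e in exps])
        # M-step: re-estimate topic priors and word distributions.
        priors, word_probs = m_step(resp)

    return priors, word_probs, resp


def related_words(word_probs, topic, seeds, k):
    """Top-k non-seed words for a topic, ranked by p(w|z) (a stand-in score)."""
    seed_set = seeds.get(topic, set())
    ranked = sorted((w for w in word_probs[topic] if w not in seed_set),
                    key=lambda w: (-word_probs[topic][w], w))
    return ranked[:k]


if __name__ == "__main__":
    toy_docs = [["tokyo", "osaka", "kyoto"], ["osaka", "kyoto", "nagoya"],
                ["apple", "banana", "grape"], ["banana", "grape", "melon"]]
    toy_seeds = {0: {"tokyo"}, 1: {"apple"}}
    _, probs, _ = train_mixture_of_unigrams(toy_docs, 2, toy_seeds)
    print(related_words(probs, 0, toy_seeds, 3))
```

In an interactive setting, a user could inspect the returned words, move accepted ones into `seeds`, and retrain; because the model is so simple, each retraining pass is cheap, which is consistent with the abstract's emphasis on rapid model calculation.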

Similar resources

A Novel Face Detection Method Based on Over-complete Incoherent Dictionary Learning

In this paper, the face detection problem is considered using concepts from compressive sensing. This technique includes a dictionary learning procedure and a sparse coding method to represent the structural content of input images. In the proposed method, dictionaries are learned in such a way that the trained models have the least degree of coherence with each other. The novelty of the prop...

A protocol for constructing a domain-specific ontology for use in biomedical information extraction using lexical-chaining analysis

In order to do more semantics-based information extraction, we require specialized domain models. We develop a hybrid approach for constructing such a domain-specific ontology, which integrates key concepts from the protein-protein interaction domain with the Gene Ontology. In addition, we present a method for using the domain-specific ontology in a discourse-based analysis module for analyzin...

Creating Lexical Resources for Endangered Languages

This paper examines approaches to generate lexical resources for endangered languages. Our algorithms construct bilingual dictionaries and multilingual thesauruses using public Wordnets and a machine translator (MT). Since our work relies on only one bilingual dictionary between an endangered language and an “intermediate helper” language, it is applicable to languages that lack many existing r...

Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, co-occurring words may not be semantically related in a manner that is relevant for topic coherence. Exploiting di...

A Reversible and Reusable Morpho-Lexical Description of Romanian

Constructing a natural language dictionary and/or a grammar for computational use is a far-reaching project, requiring very substantial human and material resources. The generalisation of lexical approaches in natural language modelling gives the dictionary an essential role in every system for automatic natural language processing. More and more information, which was traditionally encoded b...

Publication date: 2012